SECTION I



INTRODUCTION

Discrimination analysis has developed through broad phases in much the same manner as the general history of statistical inference. There have been the Pearsonian phase with the introduction of the coefficient of racial likeness, the Fisherian phase connected with the linear discriminant function, the Neyman-Pearson phase with the introduction of the notions of risk and minimax, and the contemporary Waldian phase. Although the coefficient of racial likeness and the generalized distance, proposed by Karl Pearson and P. C. Mahalanobis, respectively, are statistics for testing the hypothesis of homogeneity, they were the predecessors of discriminatory techniques. It was not until the middle 1930's that R. A. Fisher presented the first clear statement of the problem of discrimination and the first proposed solution to it. An excellent survey of the literature on discriminatory analysis and related topics has been compiled by J. L. Hodges in [4].

The general discrimination problem may be classified into three principal types as follows:

(1). A Finite Number of Known Distributions - Let X be a random variable which is known to be distributed according to one of a finite number of distributions with known density functions, $f_j(x)$, j = 1, ..., m. On the basis of an observation on X, the problem is to determine which one of the m known distributions is the distribution of X.

(2). Finite Number of Parametric Families of Distributions - Let X be a random variable which is known to have a distribution in one of a finite number of families of distributions. The distributions in the j-th family have density functions, $f_j(x; \theta_j)$, of known form which depend upon the parameter $\theta_j$, which lies in a parameter space $\Omega_j$, j = 1, ..., m. On the basis of an observation on X, the problem is to determine which one of the m families of distributions contains the distribution of X.

(3). Nonparametric - Let T be an individual which is known to belong to one of a finite number of populations, $\pi_j$, j = 1, ..., m. To each individual there corresponds an observable value of a random variable which could be vector-valued. On the basis of a random sample of $n_j$ individuals from population $\pi_j$, j = 1, ..., m, the problem is to decide which one of the m populations contains the individual T as a member.

It may be that the only observation available is the observation on the random variable, X, to be classified, but usually there are, in addition to the observation to be classified, other observations available which can be used to estimate the distributions to which X is to be assigned.

The nonparametric type of discrimination problem has received the least attention to date. In [2], Hodges and Fix considered the problem of nonparametric classification in the case of two populations and developed procedures which were shown to have asymptotically optimum properties for large samples. In [3], Hodges and Fix compared several of these nonparametric procedures against the linear discriminant function when the two populations are normal with equal covariance matrices. The linear discriminant function is a widely employed classification procedure, and it is therefore of interest to determine the performance of this procedure when the populations are not Gaussian. In [1], Thomas E. Eaton compared one of the nonparametric procedures proposed in [2] against the linear discriminant function when the two populations were exponential. The basis of comparison in both [1] and [3] was the probability of misclassification. This thesis is a continuation of the research started in [1].

Section II summarizes the procedures and results of [3], since all of the procedures used in this paper are analogous. Section III provides a complete comparison of the probabilities of misclassification of a nonparametric procedure against the linear discriminant function when the two populations are exponential. Section III also includes a limited tabulation of the probabilities of misclassification for the linear discriminant function when the two populations are gamma and one of the parameters has its domain restricted to the positive integers. Due to time limitations, it was not possible to determine a satisfactory computational formula for the probabilities of misclassification for the nonparametric procedure when the two populations are gamma. Section IV contains conclusions and recommendations based on the results obtained in Section III.

I am indebted to Professor J. R. Borsting for his 
encouragement and most capable guidance and advice 
while acting as faculty advisor, and wish to thank 
Professor R. R. Read for his valuable assistance and 
advice as second reader. Also, I wish to thank and 
acknowledge Mrs. Patricia Johnson for programming the 
procedures developed in Section III of this thesis. 
SECTION II



PERFORMANCE OF THE LINEAR DISCRIMINANT 
FUNCTION AND A NONPARAMETRIC DISCRIMINATOR 
WHEN THE TWO POPULATIONS HAVE NORMAL 
DISTRIBUTIONS WITH EQUAL COVARIANCE MATRICES 

Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ be samples from the p-variate distributions F and G, respectively, and let Z be an observation known to be from either F or G; on what basis is it decided to which population Z belongs? When F and G are p-variate normal distributions with equal covariance matrices, the linear discriminant function is known to be an appropriate procedure. But what is a reasonable procedure when the parametric forms of F and G are not known?

In [2], Hodges and Fix suggest, as an intuitive approach, the following nonparametric procedure: Define in p-dimensional space a notion of distance which will permit a ranking of the 2n observations according to their nearness to Z. Then select an odd integer, k, and assign Z to that distribution from which came the majority of the k nearest observations. Several classes of these nonparametric discriminators are shown to have asymptotically optimum performance in the sense that the probabilities of misclassification,

$P_1$ = P[Z is assigned to G | Z came from F]

$P_2$ = P[Z is assigned to F | Z came from G]

tend, as n tends to infinity, to the theoretical minimum values that could be attained if F and G were completely known.
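
The majority-vote procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the toy samples are hypothetical, and the maximum coordinate difference is used as the distance, which is just one admissible choice.

```python
from collections import Counter

def chebyshev(x, z):
    """Maximum coordinate difference between two equal-length points (assumed distance)."""
    return max(abs(a - b) for a, b in zip(x, z))

def knn_assign(xs, ys, z, k=1, dist=chebyshev):
    """Assign z to 'F' or 'G' by majority vote among the k nearest pooled observations."""
    if k % 2 == 0:
        raise ValueError("k must be odd so that the vote cannot tie")
    # Pool the two samples, remembering which population each point came from.
    pooled = [(dist(x, z), "F") for x in xs] + [(dist(y, z), "G") for y in ys]
    pooled.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in pooled[:k])
    return votes.most_common(1)[0][0]

# Hypothetical bivariate samples, three observations from each population.
xs = [(0.0, 0.1), (0.3, -0.2), (-0.4, 0.2)]   # sample from F
ys = [(2.0, 0.0), (1.7, 0.3), (2.4, -0.1)]    # sample from G
print(knn_assign(xs, ys, (0.2, 0.0), k=1))
print(knn_assign(xs, ys, (1.9, 0.1), k=3))
```

Requiring k to be odd mirrors the text: with two populations an odd number of votes always produces a strict majority.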

Since it would not be reasonable to employ a nonparametric procedure solely on the basis of asymptotic properties and simplicity of application, an investigation is made in [3] to determine how much discriminating power is lost through the use of a nonparametric discriminator when samples are small. To this end, Hodges and Fix assume that F and G are normal with equal covariance matrices so that the linear discriminant function is appropriate. A comparison of the probabilities of misclassification, $P_1$ and $P_2$, which result when the linear discriminant function is employed with the corresponding probabilities $P_1$ and $P_2$ obtained when an alternate nonparametric discrimination procedure is used then indicates how much discriminating power is lost when sample sizes are small. The remainder of this Section is devoted to summarizing some of the procedures and results of [3].

The principal distance function compared with the linear discriminant function is

(1). $\Delta(x,z) = \max_{1 \le i \le p} |x_i - z_i|$

although $\Delta$ is just one of a large class of distance functions, any one of which could be used. This fact is mentioned since the probabilities of error, $P_1$ and $P_2$, depend very heavily on the distance function employed. Also, a great part of the computations are made using k = 1, that is, Z is assigned to the population F or G from which came the individual of the pooled samples which most closely resembles Z. This case will be denoted the rule of the "nearest neighbor."

By considering linear transformations on the observation space, the problem can be reduced considerably, since it is always possible by such transformations to ensure that F and G will have the identity covariance matrix. Thus, the p transformed measurements have unit variance and are independent in each population. Also, it is possible by such transformations to place the expectation vector of F at the origin and the expectation vector of G on the positive first axis. In performing such linear transformations, the probabilities of misclassification, $P_1$ and $P_2$, are unchanged for both the nonparametric procedure and the linear discriminant function. Thus, without loss of generality, it is sufficient to consider the transformed populations with the two parameters, p and $\lambda$, where

$\lambda$ = E(first coordinate of Y)
  = distance between the means of the transformed populations.

Furthermore, from the symmetry of the problem it is evident that $P_1 = P_2$ for both procedures; consequently, it is sufficient to compute $P_1$, that is, to assume Z is distributed according to F.
For the univariate case, p = 1, the linear discriminant function is greatly simplified since no matrix computation occurs. The procedure consists simply of computing the arithmetic mean of the sample means,

$(\bar{X} + \bar{Y})/2$,

and assigning Z to that population whose sample mean lies on the same side of $(\bar{X} + \bar{Y})/2$ as does Z itself. The probabilities of misclassification are readily computed by introducing two new variables which are functions of $\bar{X}$, $\bar{Y}$, and Z. The exact procedure is outlined in [3], but is not included in this summary since the subsequent investigation does not depend upon this technique. Table 1 provides a tabulation of values of $P_1 = P_2$ for various values of n and $\lambda$. All tables in this section have been reproduced from [3].
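
The univariate midpoint rule just described amounts to a comparison of signs. The following is a minimal sketch, with a hypothetical function name and made-up sample values; ties with the midpoint, which have probability zero for continuous data, are resolved arbitrarily.

```python
from statistics import mean

def ldf_assign_univariate(xs, ys, z):
    """Assign z to 'F' (the xs population) or 'G' (the ys population).

    z goes to the population whose sample mean lies on the same side of
    the midpoint (x_bar + y_bar) / 2 as z itself.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    midpoint = (x_bar + y_bar) / 2
    z_above = z > midpoint
    y_above = y_bar > midpoint      # equivalent to y_bar > x_bar
    return "G" if z_above == y_above else "F"

xs = [0.1, -0.3, 0.5]   # hypothetical sample from F, mean 0.1
ys = [1.8, 2.2, 2.6]    # hypothetical sample from G, mean 2.2
print(ldf_assign_univariate(xs, ys, 0.9))   # below the midpoint 1.15
print(ldf_assign_univariate(xs, ys, 1.4))   # above the midpoint 1.15
```

Note that the rule never uses the sample variances, which is why it reduces to the two events in terms of the sample means and Z that are used later to express the error probability.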

For p = 1, the distance function $\Delta$ corresponds to ordinary Euclidean distance, and the nonparametric procedure using the "rule of the nearest neighbor," k = 1, consists of assigning Z to that population from which came the sample individual nearest to Z. The probability, $P_1$, that the nearest neighbor to Z is one of the Y's, given that Z is distributed as X, is readily computed using the following technique. Define $P_1(z)$ to be the conditional probability that the nearest of the 2n sample observations to Z is a Y, given that Z = z. Hence,

(2). $P_1 = E[P_1(Z)] = \int f(z)\,P_1(z)\,dz$

where f is the density function corresponding to F. Continuing exactly as in [3], it remains only to calculate $P_1(z)$. The event, "the nearest sample value to z is a Y," may be classified into n exclusive events, "the nearest sample value to z is $Y_i$," i = 1, 2, ..., n, where the $|Y_i - z|$ are independent identically distributed random variables. By defining

$H_z(\delta) = P(|X - z| < \delta)$

and

$K_z(\delta) = P(|Y - z| < \delta)$,

it is readily shown that the density function for the minimum of the $|Y_i - z|$, i = 1, 2, ..., n, is

$n\,[1 - K_z(\delta)]^{n-1}\,dK_z(\delta)$

and that $P_1(z)$ can be computed by the formula

(3). $P_1(z) = n \int_0^{\infty} [1 - H_z(\delta)]^n\,[1 - K_z(\delta)]^{n-1}\,dK_z(\delta)$

Formulae (2) and (3) form the basis for all the computations for the "nearest neighbor rule" for any p. Tables 2 and 2A provide a tabulation of $P_1 = P_2$ for the nonparametric discriminator, k = 1, for various values of n and $\lambda$.
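
Formulae (2) and (3) can be evaluated numerically for the univariate normal case. The sketch below is not the computing method of [3]; it assumes F = N(0, 1) and G = N($\lambda$, 1), uses composite Simpson's rule with truncation limits and step sizes chosen by hand, and all function names are hypothetical.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def simpson(func, a, b, m):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    total = func(a) + func(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def p1_given_z(z, n, lam):
    """Formula (3) for F = N(0, 1) and G = N(lam, 1)."""
    def integrand(d):
        H = Phi(z + d) - Phi(z - d)                 # P(|X - z| < d)
        K = Phi(z + d - lam) - Phi(z - d - lam)     # P(|Y - z| < d)
        dK = phi(z + d - lam) + phi(z - d - lam)    # derivative of K with respect to d
        return (1.0 - H) ** n * (1.0 - K) ** (n - 1) * dK
    return n * simpson(integrand, 0.0, 16.0, 320)

def p1_nearest_neighbor(n, lam):
    """Formula (2): average of P_1(z) over Z distributed as N(0, 1)."""
    return simpson(lambda z: phi(z) * p1_given_z(z, n, lam), -7.0, 7.0, 140)

print(round(p1_nearest_neighbor(1, 1.0), 4))
```

For n = 1 the result can be checked against a closed form: the nearest neighbor is the Y exactly when (Y - X) and (X + Y - 2Z) have opposite signs, and these two normal variables are independent. The numerical value should agree with the n = 1, $\lambda$ = 1 entry of Table 2 to the tabulated precision.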
It was shown in [3] that for large n,

(4). $P_1 \approx \int_{-\infty}^{\infty} \frac{g(z)\,f(z)}{f(z) + g(z)}\,dz$

where g is the density function corresponding to G. The above formula was obtained from an expansion of formula (3) and is quite general. An application of Schwarz's inequality to formula (4) shows that the integral cannot exceed 1/2.
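
As a rough check of the large-sample formula, the integral of f g / (f + g) can be evaluated by Simpson's rule for two unit-variance normal densities whose means differ by $\lambda$. The helper names and the integration limits below are assumptions made for this sketch only.

```python
import math

def simpson(func, a, b, m):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    total = func(a) + func(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def limiting_error(f, g, a, b, m=2000):
    """Integral of f*g/(f+g) over [a, b]; the large-n error of the nearest neighbor rule."""
    def integrand(z):
        fz, gz = f(z), g(z)
        s = fz + gz
        return fz * gz / s if s > 0.0 else 0.0
    return simpson(integrand, a, b, m)

def normal_density(center):
    """Unit-variance normal density with the given mean."""
    return lambda z: math.exp(-0.5 * (z - center) ** 2) / math.sqrt(2.0 * math.pi)

for lam in (1.0, 2.0, 3.0):
    value = limiting_error(normal_density(0.0), normal_density(lam), -12.0, 15.0)
    print(lam, round(value, 3))
```

When the two densities coincide the integrand is f/2, so the integral equals 1/2, which is the extreme case of the bound stated above.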

Also investigated in [3] are the following additional cases:

(i) A nonparametric procedure using $\Delta$ as a distance function with k > 2 for the univariate and bivariate normal distributions.

(ii) A nonparametric procedure using $\Delta$ as a distance function with k = 1, n = 1, and p ≥ 2.

(iii) The effect of other distance functions on the probabilities of misclassification for the bivariate normal distribution.

Due to laborious computations, the investigation of several of the above cases was quite limited, but the results that were obtained indicate that the nonparametric procedures gave "reasonable" error probabilities in cases (i) and (ii). For the bivariate normal distribution, however, different distance functions produced vastly different error probabilities in some instances.



TABLE 1

PROBABILITY OF ERROR, LINEAR DISCRIMINANT FUNCTION,
UNIVARIATE NORMAL DISTRIBUTIONS

n = size of sample taken from each population
$\lambda$ = distance between the means of the two populations
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)



TABLE 2

PROBABILITY OF ERROR, NONPARAMETRIC DISCRIMINATOR
WITH k = 1, UNIVARIATE NORMAL DISTRIBUTION

  n    $\lambda$=1   $\lambda$=2   $\lambda$=3
  1     .4175       .2532       .1235
  2     .4086       .2364       .1084
  3     .4052       .2307       .1036
  4     .4032       .2280       .1014






TABLE 2-A

APPROXIMATE PROBABILITY OF ERROR, NONPARAMETRIC DISCRIMINATOR
WITH k = 1, UNIVARIATE NORMAL DISTRIBUTION

  n    $\lambda$=1   $\lambda$=2   $\lambda$=3
        .403        .226        .102
  5     .401        .225        .100
 10     .399        .223        .098
 20     .398        .224        .098
 50     .398        .225        .098
  ∞     .398        .225        .098

n = size of sample from each population
$\lambda$ = distance between the means of the two populations
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)
SECTION III



PERFORMANCE OF THE LINEAR DISCRIMINANT
FUNCTION AND THE "RULE OF NEAREST NEIGHBOR"
WHEN THE TWO POPULATIONS HAVE GAMMA DISTRIBUTIONS

The validity of the linear discriminant function when the data are obviously not normal has been of great concern to many users, and potential users, of this discrimination procedure. In [1], T. E. Eaton investigated the performance of the linear discriminant function and a nonparametric procedure for sample sizes one and two when the univariate distributions, F and G, are assumed to be exponential with parameters $\lambda$ and $\mu$, respectively. This investigation was performed by computing the probabilities of misclassification. The results of this study showed that both the linear discriminant function and the nonparametric discriminator using $\Delta$ as a distance function and the "rule of nearest neighbor" can give high probabilities of misclassification for sample sizes one and two. In this section, the investigation started in [1] is continued in order to provide a limited indication of how much discriminating power the linear discriminant function and the "rule of nearest neighbor" have when the populations are not normal.

The scope of the present study is an investigation of the probabilities of misclassification,

$P_1$ = P[Z is assigned to G | Z came from F]

$P_2$ = P[Z is assigned to F | Z came from G],

for the two population classification problem when the following two procedures are employed:

(i) The nonparametric procedure employing $\Delta$ as a distance function and using the "rule of the nearest neighbor," k = 1, when F and G are exponentially distributed with parameters $\lambda$ and $\mu$, respectively, and $\lambda = c\mu$ where c is greater than zero.

(ii) The linear discriminant function when F and G have gamma distributions with parameters (r, $\lambda$) and (r, $\mu$), respectively, where r is a positive integer, and, as above, $\lambda = c\mu$ where c is greater than zero.

The density functions of F and G will be denoted by $f(x; r, c\mu)$ and $g(y; r, \mu)$, respectively, where

(5). $f(x; r, c\mu) = \frac{(c\mu)^r x^{r-1}}{\Gamma(r)} \exp(-c\mu x)$

and

(6). $g(y; r, \mu) = \frac{\mu^r y^{r-1}}{\Gamma(r)} \exp(-\mu y)$

Obviously, when r = 1 in formulas (5) and (6) above, $f(x; 1, c\mu)$ and $g(y; 1, \mu)$ are exponential.

A computation formula for the error probabilities, $P_1$ and $P_2$, will be developed first for the "rule of nearest neighbor," procedure (i) above. This procedure consists of assigning Z to that population from which came the sample individual nearest to Z.
Assuming equal samples, say of size n, are available from each population, it is observed that the following relation,

(7). $P_1(n, c) = P_2(n, 1/c)$

exists between the error probabilities when F and G have gamma distributions with density functions defined by formulas (5) and (6); hence, this relationship exists when F and G are exponential. Using exactly the same technique as was outlined in Section II, it is observed that if Z = z, and $\delta$ > 0, then

$H_z(\delta) = P(|X - z| < \delta) = \begin{cases} \int_{z-\delta}^{z+\delta} f(x; r, c\mu)\,dx, & \text{if } \delta \le z \\ \int_{0}^{z+\delta} f(x; r, c\mu)\,dx, & \text{if } \delta > z \end{cases}$

$K_z(\delta) = P(|Y - z| < \delta) = \begin{cases} \int_{z-\delta}^{z+\delta} g(y; r, \mu)\,dy, & \text{if } \delta \le z \\ \int_{0}^{z+\delta} g(y; r, \mu)\,dy, & \text{if } \delta > z \end{cases}$

It follows from formulas (2) and (3) of Section II that

$P_1(n, c) = n \int_0^{\infty} f(z; r, c\mu)\,dz \int_0^{\infty} [1 - H_z(\delta)]^n\,[1 - K_z(\delta)]^{n-1}\,dK_z(\delta)$.

Hence, by the simple change of variables, $\delta' = c\delta$, $z' = cz$, $y' = cy$ and $x' = cx$, it follows that

$P_1(n, c) = n \int_0^{\infty} f(z; r, \mu)\,dz \int_0^{\infty} [1 - H_z(\delta)]^n\,[1 - K_z(\delta)]^{n-1}\,dK_z(\delta) = P_2(n, 1/c)$,

where, after the change of variables, $H_z$ and $K_z$ are formed from gamma densities with parameters $\mu$ and $\mu/c$ in place of $c\mu$ and $\mu$.

Unfortunately, it was only possible to determine a suitable computational formula for $P_1(n,c)$ when F and G are assumed to have exponential distributions. A preliminary survey indicated that a large computational program would be required if F and G are assumed to have the gamma distributions defined at the beginning of this section.

When F and G are assumed to be exponential, a suitable computation formula for $P_1(n,c)$ is obtained as follows: First, let $z' = \mu z$, $\delta' = \mu\delta$, integrate and combine terms to obtain

$P_1(n,c) = \frac{c}{(c+1)(2nc+2n+c)} + 2nc \int_0^{\infty} \exp(-cz-z)\,dz \int_0^{z} [1 - 2\exp(-cz)\sinh c\delta]^n\,[1 - 2\exp(-z)\sinh\delta]^{n-1} \cosh\delta\,d\delta$

Then by interchanging the order of integration and expanding both $[1 - 2\exp(-cz)\sinh c\delta]^n$ and $[1 - 2\exp(-z)\sinh\delta]^{n-1}$ into binomial series, it can be shown that

$P_1(n,c) = \frac{c}{(c+1)(2nc+2n+c)} + nc \sum_{k=0}^{n} \sum_{j=0}^{n-1} \sum_{i=0}^{k} \sum_{p=0}^{j} \binom{n}{k}\binom{n-1}{j}\binom{k}{i}\binom{j}{p} \frac{(-1)^{i}\,(-1)^{p}}{(ck+j+c+1)} \left[\frac{1}{F_{k,j,i,p}} + \frac{1}{F_{k,j,i,p}+2}\right]$

where

$F_{k,j,i,p} = (2ck + 2j - 2ci - 2p + c)$
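
A direct transcription of the four-fold binomial series for the nearest neighbor rule with exponential populations is sketched below. It assumes the series consists of a leading term c/[(c+1)(2nc+2n+c)] plus nc times an alternating sum with F = 2ck + 2j - 2ci - 2p + c; the function names are hypothetical, and because the sum alternates in sign, floating point cancellation makes this form suitable only for moderate n.

```python
from math import comb

def p1_nearest_neighbor_exponential(n, c):
    """Series value of P_1(n, c): F exponential with rate c*mu, G exponential with rate mu."""
    total = 0.0
    for k in range(n + 1):
        for j in range(n):
            outer = comb(n, k) * comb(n - 1, j) / (c * k + j + c + 1)
            for i in range(k + 1):
                for p in range(j + 1):
                    F = 2 * c * k + 2 * j - 2 * c * i - 2 * p + c
                    sign = -1.0 if (i + p) % 2 else 1.0
                    total += outer * comb(k, i) * comb(j, p) * sign * (1.0 / F + 1.0 / (F + 2.0))
    return c / ((c + 1) * (2 * n * c + 2 * n + c)) + n * c * total

def p2_nearest_neighbor_exponential(n, c):
    """Relation (7): P_2(n, c) = P_1(n, 1/c)."""
    return p1_nearest_neighbor_exponential(n, 1.0 / c)

for n in (1, 2, 3):
    print(n, round(p1_nearest_neighbor_exponential(n, 2.0), 4))
```

For c = 1 the two populations coincide and the function returns 1/2 for every n, which is a convenient sanity check on the signs and binomial coefficients.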

Since $P_1(n,c) = P_2(n,1/c)$, Table 3 provides $P_1(n,c)$ for c = 1, 2, 3, 4, 5, 10, 20 and the reciprocals for a wide range of values of n. Then by utilizing formula (4) of Section II, it is possible to obtain a reasonable upper bound for $P_1(n,c)$ as n tends to infinity. To begin with, it is observed that $P_1^*(c)$, where $P_1^*(c)$ is defined as

$P_1^*(c) = \lim_{n\to\infty} P_1(n,c) = \int_0^{\infty} \frac{c\exp(-cz)}{c\exp(-cz+z) + 1}\,dz$,

has by Schwarz's inequality an upper bound of 1/2. A better upper bound can be obtained for c ≥ 5 and c ≤ 1/5 by noting that

$\frac{c\exp(-cx)}{c\exp(-cx+x)+1} \le \frac{c\exp(-cx+x)}{c\exp(-cx+x)+1}$

for 0 < x < ∞ and c > 0; hence, for c > 1, integration yields

$P_1^*(c) \le \int_0^{\infty} \frac{c\exp(-cx+x)}{c\exp(-cx+x)+1}\,dx = \frac{\ln(c+1)}{c-1}$;

therefore, the same bound with c replaced by 1/c holds for 0 < c < 1, since it is evident from formula (4) that $P_1^*(c) = P_1^*(1/c) = P_2^*(c) = P_2^*(1/c)$. Table 3 contains the limiting probabilities, P*, which were computed by numerical integration using Simpson's rule.
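
The limiting probabilities for the nearest neighbor rule can be reproduced approximately with a few lines of Simpson's rule. This is only a sketch under stated assumptions: $\mu$ is set to 1 (the limit does not depend on it), the upper limit 40 and the number of subintervals are hand-picked, and the function names are hypothetical rather than those of the original program.

```python
import math

def simpson(func, a, b, m):
    """Composite Simpson's rule with an even number m of subintervals."""
    h = (b - a) / m
    total = func(a) + func(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def nn_limit_exponential(c, upper=40.0, m=8000):
    """Integral over z >= 0 of f*g/(f+g) with f = c*exp(-c*z) and g = exp(-z)."""
    def integrand(z):
        f = c * math.exp(-c * z)
        g = math.exp(-z)
        return f * g / (f + g)
    return simpson(integrand, 0.0, upper, m)

def log_bound(c):
    """Upper bound ln(c + 1) / (c - 1), valid for c > 1."""
    return math.log(c + 1.0) / (c - 1.0)

for c in (2.0, 3.0, 5.0, 10.0):
    print(c, round(nn_limit_exponential(c), 4), round(log_bound(c), 4))
```

For c = 2 the integral has the closed form 1 - (ln 3)/2, which gives an independent check on the quadrature; the printed bound falls below 1/2 only once c is about 5 or larger, in line with the remark in the text.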

The result that the "rule of nearest neighbor" will have, as n tends to infinity, limiting probabilities of error of at most 1/2 is particularly interesting since, as will be shown, no such general statement can be made for the linear discriminant function when the populations are characterized by exponential distributions.

Considering now the linear discriminant function for the case when the populations, F and G, are assumed to have gamma distributions, a computational formula will be developed for the probabilities of misclassification. Again, it will be assumed that the samples available from each population are of equal size. Since this procedure consists of computing the arithmetic mean, $(\bar{X}+\bar{Y})/2$, of the sample means and assigning Z to that population whose sample mean lies on the same side of $(\bar{X}+\bar{Y})/2$ as does Z itself, the error with probability $P_1$ is committed if and only if

$Z > (\bar{X}+\bar{Y})/2$ and $\bar{Y} > \bar{X}$
or
$Z < (\bar{X}+\bar{Y})/2$ and $\bar{Y} < \bar{X}$.

Thus, by the definition of $P_1$ it follows that

$P_1 = P[Z > (\bar{X}+\bar{Y})/2,\ \bar{Y} > \bar{X}] + P[Z < (\bar{X}+\bar{Y})/2,\ \bar{Y} < \bar{X}]$.
For the purpose of convenience, it is desirable to define two new random variables, S and T, where $S = n\bar{X}$ and $T = n\bar{Y}$. Let the density functions of S and T be denoted by $f(s; nr, c\mu)$ and $g(t; nr, \mu)$, respectively. The probability, $P_1$, can now be expressed more conveniently as

$P_1(n,c) = P[Z > (S+T)/2n,\ T > S] + P[Z < (S+T)/2n,\ T < S]$

$= \int_0^{\infty} f(s;nr,c\mu)\,ds \int_s^{\infty} g(t;nr,\mu)\,dt \int_{(s+t)/2n}^{\infty} f(z;r,c\mu)\,dz + \int_0^{\infty} f(s;nr,c\mu)\,ds \int_0^{s} g(t;nr,\mu)\,dt \int_0^{(s+t)/2n} f(z;r,c\mu)\,dz$

As in the "rule of nearest neighbor" procedure, it can easily be shown by the change of variables, z' = cz, t' = ct, and s' = cs, that the relationship between $P_1$ and $P_2$ is again given by $P_1(n,c) = P_2(n,1/c)$.

Since $P_1(n,c) = P_2(n,1/c)$, it is sufficient to obtain a computation formula for $P_1(n,c)$. The methods employed to obtain this formula are now outlined. First, it is observed that $P_1(n,c)$ can be expressed as

$P_1(n,c) = \int_0^{\infty} f(s;nr,c\mu)\,ds \int_0^{s} g(t;nr,\mu)\,dt + 2 \int_0^{\infty} f(s;nr,c\mu)\,ds \int_s^{\infty} g(t;nr,\mu)\,dt \int_{(s+t)/2n}^{\infty} f(z;r,c\mu)\,dz - \int_0^{\infty} f(s;nr,c\mu)\,ds \int_0^{\infty} g(t;nr,\mu)\,dt \int_{(s+t)/2n}^{\infty} f(z;r,c\mu)\,dz$
Now by utilizing the well known integration by parts formula,

$\int \frac{x^{n-1}}{\Gamma(n)} \exp(-ax)\,dx = -\exp(-ax) \sum_{k=0}^{n-1} \frac{x^{k}}{a^{n-k}\,\Gamma(k+1)}$,

it can be shown that 

nr-1 



k=0 




nr + i- 1 



(nr+k + j-i-1) I 



j=0 
nr + r 



j:[l + c/(2n)] nr + 1 -j [l + c+c/n] nr + k + ^ -:l 
r-1 



[(nr-1)!] 2 / . (2n) k c r “ k 



k=0 



(nr+i-l)!(nr+k-i-l) 



il(k-i)l[l+c/(2n)] nr+1 [c+c/(2n)] 



,nr+k-i 



i = 0 
Table 4 provides a tabulation of the probabilities of misclassification, $P_1(n,c) = P_2(n,1/c)$, for r equal to 1 through 20, c = 1, 2, 3, 4, 5, 10, 20 and the reciprocals, and a fairly wide range of values for n.

The probabilities of misclassification for the linear discriminant function were also examined when unequal samples were available from the populations F and G for the special case when r = 1. Using techniques analogous to those described in the preceding paragraph, it is observed that for samples of size n and m from the populations described by the distributions F and G, respectively, the relationship between $P_1$ and $P_2$ is

Pi( j=n,i=m,c) = P 2 ( j=m,i=n, 1/c) 

where 



PiC j=n,i=m,c) = 1 - 



[l + l/(2j)} J [l + c/(2i)] 1 
i - 1 

/ \ 

1 



[ l+i/( jc)] J 



2c - 




j+k-1 
k 



[ l + ( jc )/i ] k 



k=0 



[l+c/(2i)] 1 [c/j + c + i/j"l J 



i - 1 



k=0 



j+k- 1| ( l + c/2 ) K 
( jc+c+i ) k 



Although a tabulation of the error probabilities, $P_1$ and $P_2$, when the sample sizes are not equal would be of some value and interest, time limitations precluded the computation of a table which would enumerate these probabilities.



In the special case of r = 1, it was possible to determine the limiting probabilities of misclassification. The procedure for obtaining the limiting probabilities is briefly outlined. When r = 1, the distributions F and G are exponential, and $P_1$ can be expressed as

$P_1(n,c) = 1/q(n,c) + \int_0^{\infty} f(s;n,c\mu)\,ds \int_0^{s} g(t;n,\mu)\,dt - 2 \int_0^{\infty} f(s;n,c\mu)\exp[-c\mu s/(2n)]\,ds \int_0^{s} g(t;n,\mu)\exp[-c\mu t/(2n)]\,dt$

which by the change of variables, $t' = \mu t$ and $s' = c\mu s$ for the first integral in the above expression for $P_1(n,c)$ and $s' = c\mu(2n+1)s/(2n)$ and $t' = \mu(2n+c)t/(2n)$ for the second integral, yields

$P_1(n,c) = 1/q(n,c) + \int_0^{\infty} f(s;n,1)\,ds \int_0^{s/c} g(t;n,1)\,dt - \frac{2}{q(n,c)} \int_0^{\infty} f(s;n,1)\,ds \int_0^{h(s)} g(t;n,1)\,dt$

where

$h(s) = (2n+c)s/(2nc+c)$

and

$q(n,c) = [1 + 1/(2n)]^n\,[1 + c/(2n)]^n$.
Now, if the simple one-one transformation,

x = s/(s+t)
y = s + t

is utilized, the above expression for $P_1(n,c)$ becomes

$P_1(n,c) = 1/q(n,c) + \frac{1}{\Gamma^2(n)} \int_0^{\infty} y^{2n-1}\exp(-y)\,dy \int_{c/(c+1)}^{1} x^{n-1}(1-x)^{n-1}\,dx - \frac{2}{q(n,c)\,\Gamma^2(n)} \int_0^{\infty} y^{2n-1}\exp(-y)\,dy \int_{t(n,c)}^{1} x^{n-1}(1-x)^{n-1}\,dx$,

which upon integrating out y, can be expressed as

(8). $P_1(n,c) = 1/q(n,c) + \frac{1}{B(n,n)} \int_{c/(c+1)}^{1} x^{n-1}(1-x)^{n-1}\,dx - \frac{2}{q(n,c)\,B(n,n)} \int_{t(n,c)}^{1} x^{n-1}(1-x)^{n-1}\,dx$

where

$t(n,c) = (2nc+c)/(2n+2c+2nc)$

and

$B(n,n) = \Gamma^2(n)/\Gamma(2n)$.
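
Formula (8) is straightforward to evaluate for integer n. The sketch below assumes that (8) reads P_1(n,c) = 1/q + T(c/(c+1)) - (2/q) T(t(n,c)), where T(a) denotes the probability that a Beta(n, n) variable exceeds a; it uses the standard binomial-sum identity for the incomplete beta function with integer parameters, and the function names are hypothetical.

```python
from math import comb

def beta_upper_tail(n, a):
    """P(X > a) for X distributed Beta(n, n), n a positive integer.

    Uses I_a(n, n) = sum_{j=n}^{2n-1} C(2n-1, j) a^j (1-a)^(2n-1-j).
    """
    m = 2 * n - 1
    cdf = sum(comb(m, j) * a ** j * (1.0 - a) ** (m - j) for j in range(n, m + 1))
    return 1.0 - cdf

def q(n, c):
    """q(n, c) = [1 + 1/(2n)]^n [1 + c/(2n)]^n."""
    return ((1.0 + 1.0 / (2 * n)) * (1.0 + c / (2 * n))) ** n

def t(n, c):
    """t(n, c) = (2nc + c) / (2n + 2c + 2nc)."""
    return (2 * n * c + c) / (2 * n + 2 * c + 2 * n * c)

def p1_ldf_exponential(n, c):
    """Formula (8): error probability of the linear discriminant function, r = 1."""
    inv_q = 1.0 / q(n, c)
    return inv_q + beta_upper_tail(n, c / (c + 1.0)) - 2.0 * inv_q * beta_upper_tail(n, t(n, c))

for n in (1, 2, 5, 10):
    print(n, round(p1_ldf_exponential(n, 2.0), 4), round(p1_ldf_exponential(n, 0.5), 4))
```

For n = 1 the linear discriminant function and the nearest neighbor rule make the same assignment in one dimension, so the n = 1 values should coincide with the n = 1 row of the nearest neighbor table.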

Since it is evident when c = 1 that $P_1(n,c) = 1/2$ for all n, it remains only to consider the cases, 0 < c < 1 and c > 1. By considering each case separately and applying Chebyshev's inequality to formula (8), the limiting probability of $P_1(n,c)$ is

(9). $P_1^*(c) = \lim_{n\to\infty} P_1(n,c) = \begin{cases} 1 - \exp[-(c+1)/2], & \text{if } 0 < c < 1 \\ 1/2, & \text{if } c = 1 \\ \exp[-(c+1)/2], & \text{if } c > 1 \end{cases}$

As mentioned previously, the limiting probabilities for the nonparametric discriminator, "rule of nearest neighbor," are at most 1/2, but from formula (9) it is apparent that the limiting probabilities for the linear discriminant function are greater than 1/2 for [2(ln 2) - 1] < c < 1.
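
The piecewise limit in formula (9) and the interval on which it exceeds 1/2 can be checked directly. The sketch assumes the limit is 1 - exp[-(c+1)/2] for 0 < c < 1, 1/2 for c = 1, and exp[-(c+1)/2] for c > 1; the function name is hypothetical.

```python
import math

def ldf_limit_exponential(c):
    """Limiting error probability of the linear discriminant function, r = 1."""
    if c <= 0.0:
        raise ValueError("c must be positive")
    if c == 1.0:
        return 0.5
    tail = math.exp(-(c + 1.0) / 2.0)
    return tail if c > 1.0 else 1.0 - tail

# The limit exceeds 1/2 exactly when exp[-(c+1)/2] < 1/2 with c < 1.
threshold = 2.0 * math.log(2.0) - 1.0

for c in (0.05, 0.3, threshold, 0.5, 0.9, 2.0, 10.0):
    print(round(c, 4), round(ldf_limit_exponential(c), 4))
```

The threshold 2 ln 2 - 1 is roughly 0.386, so among the tabulated reciprocals only c = .5000 lies in the interval where the limiting error is worse than a coin toss.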
TABLE 3

ERROR PROBABILITIES
RULE OF NEAREST NEIGHBOR
EXPONENTIAL POPULATIONS

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .4000   .3262   .2741   .2359   .1385   .0757
   2   .5000   .4222   .3560   .3067   .2693   .1676   .0957
   3   .5000   .4317   .3691   .3215   .2850   .1831   .1077
   4   .5000   .4368   .3761   .3297   .2938   .1924   .1155
   5   .5000   .4399   .3804   .3347   .2992   .1985   .1209
   6   .5000   .4419   .3833   .3380   .3029   .2027   .1248
   7   .5000   .4434   .3853   .3404   .3055   .2057   .1278
   8   .5000   .4445   .3868   .3421   .3074   .2080   .1301
   9   .5000   .4453   .3879   .3435   .3089   .2098   .1319
  10   .5000   .4460   .3888   .3445   .3100   .2112   .1333
  15   .5000   .4478   .3913   .3475   .3134   .2152   .1377
  20   .5000   .4487   .3925   .3489   .3149   .2171   .1398
   ∞   .5000   .4507   .3954   .3524   .3188   .2217   .1447


TABLE 3 (continued)

ERROR PROBABILITIES
RULE OF NEAREST NEIGHBOR
EXPONENTIAL POPULATIONS

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .5333   .5214   .5037   .4870   .4329   .3907
   2   .5003   .4666   .4340   .4068   .3277   .2714
   3   .4856   .4426   .4043   .3733   .2858   .2239
   4   .4773   .4294   .3884   .3558   .2647   .1997
   5   .4719   .4212   .3788   .3455   .2527   .1854
   6   .4682   .4157   .3725   .3389   .2451   .1761
   7   .4655   .4118   .3683   .3345   .2400   .1698
   8   .4634   .4089   .3652   .3313   .2364   .1652
   9   .4618   .4067   .3629   .3290   .2338   .1617
  10   .4605   .4050   .3612   .3273   .2319   .1592
  20   .4547   .3984   .3549   .3212   .2247   .1492
   ∞   .4507   .3954   .3524   .3188   .2217   .1447
TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 1

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .4000   .3262   .2741   .2359   .1385   .0757
   2   .5000   .3637   .2652   .2006   .1567   .0627   .0209
   3   .5000   .3391   .2292   .1619   .1188   .0366   .0083
   4   .5000   .3207   .2058   .1393   .0984   .0254   .0043
   5   .5000   .3062   .1898   .1252   .0864   .0197   .0026
   6   .5000   .2945   .1784   .1161   .0790   .0164   .0017
   7   .5000   .2848   .1702   .1099   .0741   .0142   .0012
   8   .5000   .2768   .1642   .1056   .0706   .0127   .0009
   9   .5000   .2701   .1597   .1024   .0681   .0115   .0007
  10   .5000   .2643   .1562   .1000   .0661   .0106   .0006
  15   .5000   .2460   .1473   .0936   .0606   .0082   .0003
  20   .5000   .2369   .1439   .0907   .0579   .0070   .0002
  25   .5000   .2322   .1421   .0890   .0563   .0064   .0001
  30   .5000   .2296   .1409   .0879   .0552   .0060   .0001
  35   .5000   .2280   .1401   .0870   .0544   .0057   .0001
  40   .5000   .2271   .1395   .0864   .0538   .0055   .0001
  50   .5000   .2260   .1387   .0856   .0530   .0052   .0001
  60   .5000   .2255   .1381   .0850   .0525   .0050   .0001
  70   .5000   .2251   .1377   .0846   .0521   .0049   .0001
  80   .5000   .2249   .1374   .0843   .0518   .0043   .0000
  90   .5000   .2247   .1372   .0840   .0516   .0047   .0000
 100   .5000   .2245   .1370   .0630   .0514   .0046   .0000
   ∞   .5000   .2231   .1353   .0821   .0498   .0041   .0000
TABLE 4 (continued)

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 1

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .5333   .5214   .5037   .4870   .4329   .3907
   2   .5299   .5041   .4782   .4577   .4076   .3813
   3   .5278   .4947   .4666   .4469   .4052   .3866
   4   .5265   .4893   .4613   .4429   .4072   .3912
   5   .5256   .4860   .4588   .4419   .4096   .3944
   6   .5251   .4841   .4578   .4419   .4115   .3966
   7   .5247   .4829   .4576   .4425   .4131   .3982
   8   .5245   .4823   .4577   .4431   .4142   .3995
   9   .5244   .4819   .4580   .4438   .4152   .4004
  10   .5243   .4818   .4584   .4444   .4160   .4012
  15   .5244   .4823   .4601   .4465   .4182   .4036
  20   .5247   .4831   .4612   .4477   .4195   .4048
  25   .5250   .4838   .4619   .4484   .4202   .4055
  30   .5254   .4842   .4624   .4488   .4206   .4060
  35   .5256   .4846   .4627   .4492   .4210   .4063
  40   .5258   .4848   .4630   .4494   .4212   .4066
  50   .5262   .4852   .4633   .4498   .4216   .4070
  60   .5264   .4854   .4636   .4500   .4218   .4072
  70   .5266   .4856   .4637   .4502   .4220   .4074
  80   .5267   .4857   .4639   .4503   .4221   .4075
  90   .5268   .4858   .4640   .4504   .4222   .4076
 100   .5269   .4859   .4640   .4505   .4222   .4077
   ∞   .5277   .4866   .4647   .4512   .4231   .4084
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TABLE 4 






LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 2

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .3598   .2532   .1851   .1404   .0512   .0158
   2   .5000   .3127   .1839   .1133   .0733   .0138   .0017
   3   .5000   .2836   .1508   .0850   .0505   .0063   .0004
   4   .5000   .2639   .1329   .0717   .0406   .0033   .0001
   5   .5000   .2498   .1226   .0644   .0353   .0026   .0001
   6   .5000   .2396   .1162   .0600   .0320   .0020   .0000
   7   .5000   .2319   .1120   .0571   .0298   .0016   .0000
   8   .5000   .2261   .1090   .0549   .0281   .0013   .0000
   9   .5000   .2217   .1069   .0533   .0269   .0011   .0000
  10   .5000   .2182   .1052   .0520   .0259   .0010   .0000
  15   .5000   .2092   .1006   .0481   .0229   .0006   .0000
  20   .5000   .2058   .0984   .0462   .0215   .0005   .0000
  25   .5000   .2042   .0970   .0450   .0207   .0004   .0000
  30   .5000   .2033   .0961   .0443   .0201   .0004   .0000
  35   .5000   .2027   .0955   .0437   .0197   .0003   .0000
  40   .5000   .2022   .0950   .0433   .0194   .0003   .0000
  50   .5000   .2016   .0943   .0427   .0190   .0003   .0000
  60   .5000   .2012   .0939   .0423   .0187   .0003   .0000
  70   .5000   .2009   .0935   .0421   .0185   .0003   .0000
  80   .5000   .2007   .0933   .0419   .0184   .0003   .0000
  90   .5000   .2005   .0931   .0417   .0183   .0003   .0000
 100   .5000   .2004   .0929   .0416   .0162   .0002   .0000


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 2

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .4999   .4487   .4071   .3770   .3109   .2807
   2   .4782   .4102   .3678   .3423   .2976   .2794
   3   .4658   .3946   .3566   .3354   .2979   .2806
   4   .4580   .3877   .3533   .3343   .2986   .2812
   5   .4527   .3846   .3526   .3344   .2991   .2815
   6   .4491   .3832   .3526   .3348   .2994   .2817
   7   .4466   .3827   .3528   .3351   .2997   .2819
   8   .4448   .3826   .3530   .3354   .2999   .2820
   9   .4435   .3826   .3533   .3356   .3000   .2821
  10   .4426   .3827   .3535   .3358   .3001   .2821
  15   .4409   .3833   .3541   .3363   .3004   .2823
  20   .4407   .3837   .3544   .3366   .3005   .2824
  25   .4409   .3840   .3546   .3367   .3006   .2824
  30   .4410   .3841   .3547   .3368   .3007   .2825
  35   .4412   .3842   .3548   .3369   .3007   .2825
  40   .4413   .3843   .3549   .3370   .3008   .2825
  50   .4415   .3845   .3550   .3371   .3008   .2825
  60   .4416   .3845   .3550   .3371   .3008   .2826
  70   .4417   .3846   .3551   .3371   .3008   .2826
  80   .4417   .3846   .3551   .3372   .3009   .2826
  90   .4418   .3847   .3552   .3372   .3009   .2826
 100   .4410   .3847   .3552   .3372   .3009   .2826









TABLE 4 



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 3

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .3266   .1998   .1278   .0859   .0197   .0035
   2   .5000   .2719   .1317   .0675   .0368   .0034   .0002
   3   .5000   .2412   .1046   .0485   .0237   .0012   .0000
   4   .5000   .2223   .0918   .0403   .0184   .0006   .0000
   5   .5000   .2100   .0849   .0359   .0156   .0004   .0000
   6   .5000   .2018   .0807   .0331   .0138   .0003   .0000
   7   .5000   .1961   .0779   .0312   .0126   .0002   .0000
   8   .5000   .1920   .0758   .0298   .0117   .0001   .0000
   9   .5000   .1891   .0743   .0287   .0110   .0001   .0000
  10   .5000   .1869   .0730   .0278   .0105   .0001   .0000
  15   .5000   .1815   .0694   .0252   .0090   .0001   .0000
  20   .5000   .1794   .0675   .0240   .0083   .0000   .0000
  25   .5000   .1782   .0664   .0232   .0078   .0000   .0000
  30   .5000   .1774   .0657   .0227   .0076   .0000   .0000
  35   .5000   .1769   .0651   .0224   .0074   .0000   .0000
  40   .5000   .1765   .0648   .0221   .0072   .0000   .0000
  50   .5000   .1759   .0642   .0217   .0070   .0000   .0000
  60   .5000   .1755   .0638   .0215   .0069   .0000   .0000
  70   .5000   .1752   .0636   .0213   .0068   .0000   .0000
  80   .5000   .1750   .0634   .0212   .0067   .0000   .0000
  90   .5000   .1749   .0632   .0211   .0067   .0000   .0000
 100   .5000   .1747   .0631   .0210   .0066   .0000   .0000






TABLE 4 



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 3

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .4660   .3876   .3361   .3041   .2471   .2260
   2   .4326   .3443   .3004   .2770   .2374   .2199
   3   .4154   .3307   .2931   .2728   .2353   .2173
   4   .4056   .3260   .2913   .2718   .2341   .2158
   5   .3997   .3242   .2908   .2713   .2333   .2148
   6   .3960   .3236   .2905   .2710   .2328   .2141
   7   .3937   .3234   .2904   .2708   .2324   .2136
   8   .3923   .3233   .2903   .2707   .2321   .2132
   9   .3913   .3233   .2902   .2705   .2318   .2129
  10   .3908   .3233   .2902   .2704   .2316   .2126
  15   .3899   .3233   .2900   .2701   .2310   .2119
  20   .3900   .3233   .2899   .2699   .2307   .2115
  25   .3901   .3233   .2898   .2698   .2305   .2112
  30   .3902   .3233   .2898   .2698   .2303   .2111
  35   .3903   .3233   .2897   .2697   .2302   .2109
  40   .3903   .3233   .2897   .2697   .2302   .2108
  50   .3904   .3233   .2897   .2696   .2301   .2107
  60   .3904   .3233   .2897   .2696   .2300   .2106
  70   .3905   .3233   .2896   .2695   .2299   .2106
  80   .3905   .3233   .2896   .2695   .2299   .2105
  90   .3905   .3233   .2896   .2695   .2299   .2105
 100   .3905   .3233   .2896   .2695   .2298   .2105


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 4

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .2974   .1508   .0892   .0533   .0078   .0008
   2   .5000   .2379   .0964   .0417   .0193   .0009   .0000
   3   .5000   .2076   .0750   .0289   .0117   .0003   .0000
   4   .5000   .1904   .0656   .0235   .0087   .0001   .0000
   5   .5000   .1801   .0606   .0206   .0071   .0001   .0000
   6   .5000   .1736   .0574   .0187   .0061   .0000   .0000
   7   .5000   .1693   .0552   .0174   .0055   .0000   .0000
   8   .5000   .1663   .0536   .0165   .0050   .0000   .0000
   9   .5000   .1642   .0524   .0158   .0046   .0000   .0000
  10   .5000   .1626   .0514   .0152   .0044   .0000   .0000
  15   .5000   .1586   .0484   .0135   .0036   .0000   .0000
  20   .5000   .1567   .0469   .0127   .0032   .0000   .0000
  25   .5000   .1556   .0460   .0122   .0030   .0000   .0000
  30   .5000   .1549   .0454   .0119   .0029   .0000   .0000
  35   .5000   .1544   .0449   .0117   .0028   .0000   .0000
  40   .5000   .1540   .0446   .0115   .0027   .0000   .0000
  50   .5000   .1534   .0442   .0113   .0027   .0000   .0000
  60   .5000   .1531   .0439   .0111   .0026   .0000   .0000






TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 4

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .4345   .3383   .2844   .2543   .2062   .1885
   2   .3937   .2967   .2547   .2330   .1951   .1777
   3   .3748   .2861   .2493   .2290   .1910   .1730
   4   .3650   .2828   .2475   .2273   .1887   .1704
   5   .3597   .2815   .2465   .2262   .1872   .1688
   6   .3568   .2809   .2459   .2254   .1862   .1676
   7   .3551   .2806   .2454   .2249   .1855   .1668
   8   .3541   .2803   .2451   .2245   .1849   .1661
   9   .3536   .2802   .2448   .2241   .1845   .1656
  10   .3532   .2800   .2446   .2239   .1841   .1652
  15   .3528   .2795   .2439   .2230   .1830   .1640
  20   .3528   .2793   .2435   .2226   .1824   .1633
  25   .3528   .2792   .2433   .2223   .1821   .1630
  30   .3528   .2791   .2432   .2222   .1818   .1627
  35   .3528   .2790   .2431   .2220   .1817   .1625
  40   .3528   .2789   .2430   .2219   .1815   .1624
  50   .3528   .2789   .2429   .2218   .1814   .1622
  60   .3528   .2788   .2428   .2217   .1812   .1620






TABLE 4 



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 5

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .2714   .1269   .0629   .0335   .0031   .0002
   2   .5000   .2093   .0718   .0264   .0105   .0002   .0000
   5   .5000   .1565   .0439   .0120   .0033   .0000   .0000
  10   .5000   .1426   .0365   .0084   .0018   .0000   .0000

R = 6

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .2480   .1019   .0446   .0213   .0013   .0000
   2   .5000   .1850   .0542   .0170   .0058   .0001   .0000
   5   .5000   .1374   .0321   .0071   .0015   .0000   .0000
  10   .5000   .1256   .0261   .0047   .0008   .0000   .0000


TABLE 4






LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 5

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .4057   .2984   .2455   .2181   .1759   .1596
   2   .3605   .2607   .2208   .2000   .1627   .1458
   5   .3283   .2482   .2121   .1914   .1527   .1349
  10   .3236   .2459   .2093   .1882   .1488   .1307
  50   .3227   .2440

R = 6

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .3794   .2658   .2153   .1904   .1519   .1363
   2   .3320   .2321   .1938   .1735   .1371   .1208
   5   .3026   .2209   .1843   .1637   .1259   .1090
  10   .2988   .2180   .1808   .1599   .1217   .1045


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 7

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .2268   .0821   .0319   .0136   .0005   .0000
   2   .5000   .1642   .0414   .0111   .0032   .0000   .0000
   5   .5000   .1214   .0237   .0042   .0007   .0000   .0000
  10   .5000   .1111   .0188   .0026   .0003   .0000   .0000

R = 8

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .2077   .0664   .0230   .0088   .0002   .0000
   2   .5000   .1463   .0319   .0074   .0018   .0000   .0000
   5   .5000   .1078   .0175   .0025   .0003   .0000   .0000
  10   .5000   .0985   .0136   .0015   .0001   .0000   .0000






TABLE 4 



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 7

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .3555   .2387   .1911   .1682   .1322   .1172
   2   .3073   .2087   .1715   .1517   .1163   .1008
   5   .2809   .1979   .1612   .1409   .1046   .0888
  10   .2775   .1945   .1574   .1369   .1003   .0843

R = 8

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .3336   .2159   .1711   .1497   .1155   .1012
   2   .2857   .1890   .1526   .1333   .0992   .0846
   5   .2621   .1781   .1417   .1220   .0874   .0727
  10   .2588   .1744   .1377   .1178   .0831   .0684


TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 9

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1904   .0540   .0166   .0057   .0001   .0000
   2   .5000   .1308   .0247   .0049   .0010   .0000   .0000
   5   .5000   .0961   .0130   .0015   .0002   .0000   .0000
  10   .5000   .0875   .0099   .0009   .0001   .0000   .0000

R = 10

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1747   .0440   .0121   .0037   .0000   .0000
   2   .5000   .1174   .0193   .0033   .0006   .0000   .0000
   5   .5000   .0859   .0097   .0009   .0001   .0000   .0000


















TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 9

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .3137   .1965   .1542   .1341   .1014   .0877
   2   .2667   .1720   .1364   .1175   .0850   .0713
   5   .2454   .1609   .1251   .1060   .0734   .0599
  10   .2421

R = 10

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2954   .1798   .1397   .1206   .0893   .0762
   2   .2498   .1571   .1223   .1040   .0730   .0603
   5   .2305   .1458   .1108   .0925   .0619   .0495
  10   .2270   .1417   .1066   .0883   .0579   .0457









TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 11

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1605   .0359   .0088   .0025   .0000   .0000
   2   .5000   .1056   .0151   .0022   .0003   .0000   .0000
   5   .5000   .0770   .0073   .0005   .0000   .0000   .0000
  10   .5000   .0694   .0052   .0003   .0000   .0000   .0000

R = 12

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1475   .0294   .0065   .0016   .0000   .0000
   2   .5000   .0953   .0119   .0015   .0002   .0000   .0000
   5   .5000   .0691   .0054   .0003   .0000   .0000   .0000


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 11

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2787   .1652   .1270   .1087   .0788   .0664
   2   .2348   .1439   .1099   .0923   .0630   .0511
   5   .2170   .1324   .0984   .0809   .0523   .0410

R = 12

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2633   .1523   .1159   .0983   .0696   .0580
   2   .2212   .1321   .0990   .0821   .0544   .0435
   5   .2046   .1205   .0876   .0710   .1443   .0341
  10   .2009   .1163   .0835   .0670   .0408   .0309


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 13

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1358   .0242   .0048   .0011   .0000   .0000
   2   .5000   .0861   .0094   .0010   .0001   .0000   .0000
   5   .5000   .0621   .0041   .0002   .0000   .0000   .0000
  10   .5000   .0554   .0028   .0001   .0000   .0000   .0000

R = 14

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1250   .0199   .0035   .0007   .0000   .0000
   2   .5000   .0781   .0074   .0007   .0001   .0000   .0000
  10   .5000   .0496   .0021   .0001   .0000   .0000   .0000



TABLE 4 



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 13

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2492   .1409   .1060   .0891   .0617   .0508
   2   .2089   .1215   .0894   .0731   .0471   .0371

R = 14

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2361   .1307   .0971   .0808   .0547   .0445
   2   .1977   .1119   .0808   .0653   .0408   .0316
   5   .1829   .1004   .0698   .0549   .0320   .0237
  10   .1789   .0962


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 15

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1152   .0165   .0026   .0005   .0000   .0000
   2   .5000   .0709   .0059   .0005   .0000   .0000   .0000
   5   .5000   .0504   .0023   .0001   .0000   .0000   .0000

R = 16

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .1063   .0136   .0020   .0003   .0000   .0000
   2   .5000   .0645   .0047   .0003   .0000   .0000   .0000


TABLE 4







LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 15

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2241   .1214   .0891   .0734   .0486   .0391
   2   .1875   .1033   .0731   .0583   .0355   .0271

R = 16

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2129   .1131   .0819   .0668   .0433   .0343
   2   .1780   .0954   .0663   .0522   .0309   .0232



TABLE 4 



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 17

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .0981   .0113   .0015   .0002   .0000   .0000
   2   .5000   .0587   .0037   .0002   .0000   .0000   .0000

R = 18

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .0906   .0094   .0011   .0001   .0000   .0000
   2   .5000   .0536   .0029   .0001   .0000   .0000   .0000


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 17

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .2026   .1054   .0753   .0609   .0385   .0302
   2   .1693   .0882   .0601   .0468   .0269   .0199

R = 18

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .1929   .0985   .0693   .0555   .0344   .0266
   2   .1612   .0816   .0546   .0419   .0234   .0171



TABLE 4

LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 19

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .0838   .0078   .0008   .0001   .0000   .0000
   2   .5000   .0489   .0023   .0001   .0000   .0000   .0000

R = 20

 n\c     1.0     2.0     3.0     4.0     5.0    10.0    20.0
   1   .5000   .0775   .0065   .0006   .0001   .0000   .0000
   2   .5000   .0448   .0019   .0001   .0000   .0000   .0000


TABLE 4



LINEAR DISCRIMINANT FUNCTION
ERROR PROBABILITIES
FOR GAMMA POPULATIONS

R = 19

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .1840   .0920   .0639   .0506   .0307   .0235
   2   .1536   .0756   .0496   .0376   .0205   .0147

R = 20

 n\c   .5000   .3333   .2500   .2000   .1000   .0500
   1   .1756   .0861   .0590   .0462   .0274   .0207
   2   .1465   .0701   .0452   .0338   .0179   .0126



SECTION IV 



SUMMARY AND CONCLUSION 

Section II of this paper briefly summarizes some of the
work accomplished by Hodges and Fix in [3]. Their
investigation was concerned with the computation of the
probabilities of misclassification for various nonparametric
procedures, assuming some parametric form of the
distribution which describes the populations. The error
probabilities for the "optimum" parametric procedure were
also computed and compared with the nonparametric error
probabilities. The investigation considered the
two-population classification problem when the populations
have normal distributions with equal covariance matrices.
The parametric procedure employed was the linear
discriminant function, which is the appropriate method in
this situation, and the primary nonparametric procedure
considered was the "rule of the nearest neighbor." The two
procedures were compared by computing the probabilities of
misclassification. The results of this investigation
indicated that the "rule of nearest neighbor" gave
"reasonable" error probabilities.

Section III also considers the two-population
classification problem, but the investigation is primarily
concerned with the performance of the linear discriminant
function if the actual densities which describe the
populations are not normal but in fact gamma, with density
functions defined by formulas (5) and (6) of Section III.
Also included in Section III is a limited investigation of
the "rule of nearest neighbor" when the populations are
assumed to be exponential. The performance of both the
linear discriminant function and the "rule of nearest
neighbor" was evaluated by computing the probabilities of
misclassification.
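
The "rule of nearest neighbor" and its k-neighbor variant can be sketched for one-dimensional exponential samples as follows. This is only an illustrative Monte Carlo stand-in, not the exact computation used in this paper; the function names, the trial count, and the parameterization (population 1 with mean 1, population 2 with mean c) are assumptions made for the example.

```python
import random

def knn_classify(x, sample1, sample2, k=1):
    """Assign x to population 1 or 2 by majority vote among the k
    nearest sample points (k = 1 is the rule of nearest neighbor)."""
    labeled = [(abs(x - s), 1) for s in sample1]
    labeled += [(abs(x - s), 2) for s in sample2]
    labeled.sort(key=lambda pair: pair[0])      # nearest points first
    votes = [label for _, label in labeled[:k]]
    return 1 if votes.count(1) > votes.count(2) else 2

def estimate_error(c, n, k=1, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that an observation from
    population 1 (assumed exponential, mean 1) is assigned to
    population 2 (assumed exponential, mean c), with n sample points
    drawn from each population."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        s1 = [rng.expovariate(1.0) for _ in range(n)]
        s2 = [rng.expovariate(1.0 / c) for _ in range(n)]
        x = rng.expovariate(1.0)                # true source: population 1
        if knn_classify(x, s1, s2, k) != 1:
            errors += 1
    return errors / trials
```

With an odd k the vote cannot tie, which is one reason k = 1 and k = 3 are natural choices to compare.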

When the population densities are assumed to be
exponential, Table 3 and Table 4, for the case r = 1,
provide a means of comparing the performance of the linear
discriminant function and the "rule of nearest neighbor." An
examination of these tables indicates that both procedures
can result in "high" probabilities of error, particularly
when c assumes values near one, since for small sample
sizes both procedures can result in error probabilities
which are greater than 1/2.
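
A rough simulation can illustrate how a linear discriminant rule built from small samples behaves on exponential data. The sketch below assumes the univariate sample rule reduces to assigning an observation to the population whose sample mean is nearer (the pooled variance only scales the discriminant and does not change its sign), and it assumes population 1 has mean 1 and population 2 has mean c. Both assumptions, and all names used, are illustrative and are not taken from the formulas of Section III.

```python
import random
from statistics import mean

def ldf_assign(x, sample1, sample2):
    """Assumed univariate linear discriminant rule: assign x to the
    population whose sample mean is nearer."""
    m1, m2 = mean(sample1), mean(sample2)
    return 1 if abs(x - m1) <= abs(x - m2) else 2

def ldf_error(c, n, trials=20000, seed=1):
    """Monte Carlo estimate of the probability that an observation from
    population 1 (exponential, mean 1) is assigned to population 2
    (exponential, mean c), with n sample points per population."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        s1 = [rng.expovariate(1.0) for _ in range(n)]
        s2 = [rng.expovariate(1.0 / c) for _ in range(n)]
        if ldf_assign(rng.expovariate(1.0), s1, s2) != 1:
            wrong += 1
    return wrong / trials
```

For example, `ldf_error(0.5, 100)` can be compared with the r = 1, n = 100, c = .5000 entry of Table 4; agreement is only expected up to simulation noise and only if the assumed parameterization matches the one used in the tables.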

Even as n, the sample size from each population, tends to
infinity, the linear discriminant function has error
probabilities greater than 1/2 for [2(ln 2) - 1] < c < 1,
whereas it is of interest to note that the "rule of nearest
neighbor" in this situation will always have error
probabilities less than or equal to 1/2. Also, depending
upon the importance of each type of error, it is possible
for the linear discriminant function to be a "fairly useful"
procedure, since one error probability is usually "small."
Table 4 also shows that as r increases, the probabilities of
misclassification decrease. This result was anticipated,
since for increasing r the gamma distribution approaches the
normal distribution by the Central Limit Theorem.
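
The limiting behavior described above can be checked numerically with a short sketch. It assumes that, as n tends to infinity, the linear discriminant function assigns an observation to the population with the nearer mean, that the tabulated error is the one for the unit-scale gamma population (shape r, mean r), and that the other population has mean rc. These assumptions are inferred from the limiting rows of Table 4 for r = 1 rather than stated in this section, and the function names are hypothetical.

```python
import math

def gamma_upper_tail(r, t):
    """P(X > t) for X gamma-distributed with integer shape r, unit scale."""
    term, total = 1.0, 0.0
    for k in range(r):
        total += term            # term equals t**k / k!
        term *= t / (k + 1)
    return math.exp(-t) * total

def limiting_error(r, c):
    """Assumed n -> infinity error probability for the unit-scale
    population when the other population has mean r*c."""
    if c == 1:
        return 0.5               # identical populations; tables list .5000
    cutoff = r * (1.0 + c) / 2.0  # midpoint of the two means r and r*c
    upper = gamma_upper_tail(r, cutoff)
    return upper if c > 1 else 1.0 - upper

# For r = 1 the error is 1 - exp(-(1 + c)/2) when c < 1, which exceeds
# 1/2 exactly when c > 2 ln 2 - 1 (about 0.386).
threshold = 2.0 * math.log(2.0) - 1.0
```

Under these assumptions the r = 1 values agree with the limiting row of Table 4 to about three decimals for the cases checked (for instance c = .5000 and c = 2.0), and the error crosses 1/2 at c = 2(ln 2) - 1.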

The following recommendations are made on the basis
of this paper.

(i) Investigate the performance of the nonparametric
procedure using k = 3 instead of the "rule of
nearest neighbor," k = 1.

(ii) Investigate the performance of the nonparametric
procedures proposed by Hodges and Fix in [2],
employing different distance functions.

(iii) Develop a more satisfactory computational 
formula for the linear discriminant function 
when the populations are assumed to be gamma 
in the situation when r and n are large since 
the formula used in this paper required many 
hours of computer time. 

(iv) Investigate the performance of the linear
discriminant function and other nonparametric
procedures for other distributions. A cursory
investigation was made for the beta distribution,
and the analysis appears to be more difficult.

(v) Compare the performance of Bayesian parametric
and nonparametric classification procedures.

(vi) Investigate the classification problem when
there are more than two populations.